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Scientific  Progress 

Two  classes  of  parallel  solvers  have  been  developed.  The  first  is  a  family  of  parallel  sparse  linear  system  solvers  -  the 
PSPIKE  family  (Spike/Pardiso  hybrid  solvers).  The  second  is  a  family  of  Trace  Minimization  eigensolvers  for  the  symmetric 
standard  and  generalized  eigenvalue  problems.  Both  classes  of  solvers  proved  to  be  competitive  with  existing  solvers  in  being 
more  robust  and  more  scalable  on  parallel  architectures  with  superior  speeds.  In  particular  the  TraceMin  eigensolvers 
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Symmetric  Generalized  Eigenvalue  Problem 

Ax  =  A  Bx 


A  is  sparse  and  symmetric 
B  is  sparse  and  s.p.d. 

Obtain  the  p  eigenvalues  closest  to  0  and 
their  corresponding  eigenvectors >  p  «  n 


The  Trace  minimization  scheme 


min  tr(YTAY )  =  E  A, 

YT  BY = I  i= 1 

P 

XL1  —  XL2  —  —  Xlp  'lp  + 1  —  —  xl/7 

Y(=Rn><P  ;  P«n. 


AS.  &  J.  Wisniewski:  SINUM,  1982 
A.S.  &  Z.  Tong:  J.  Comp.  Appl.  Math.,  2000. 


Extensions  of  this  optimization  problem 

for  definite  matrix  pair 

•  B:  indefinite  nonsingular 
-Kobac-Striko  and  K.  Veselic  (1995) 

•  B:  indefinite  singular 
—  Liang ,  Li,  and  Z.  Bai 


A  and  B  are  called  " definite  pair"  if  there 
exists  a  scalar  p  such  that  (A- p  B) 

is  s.p.d ,  or  s.p.s.d 
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solve 


(  a  BYk\/Yk-Ak\jo\ 

[y^B  o)\-Lk)-{lp)  ** 

•  (Yu  -  AJ  is  used  to  generate  Yu, , 

•  tr  (Yk+1  A  Yk+1)  <  tr(Yk  A  Yk) 

•  Different  schemes  (direct/iterative)  for  solving 
(*)or(**) 

•  TraceMIN  does  not  reguire  low  relative 
residuals  in  solving  (*)  or  (**) 


a)  Using  a  direct  sparse  linear  system 

solver  (Pardiso) 

*  Solve 

AZk  =  BYt  for  Zk 

•  Solve 


I  ZA/Yt-  AA 

Y'B  Ojl  ~L 


for  (Yt-  At) 


b)  Use  a  Krylov  subspace  method  to 
solve 

(I-Pk)A(I-Pk)Ak  =  (I-Pk)AYk 


for 


where 

Pt=BYt(Y[B2Yk)-'YkTB. 


e.g.  CG,  or  Minres;  rel.  res  <  10~6 


c)  Solve 


t  A  BY\(Y- A\ 

\ytb  o  Uf 


via  a  Krylov  subspace  method  with  the 
preconditioner 

M  =  diag  (A ,  YTBA1BY) 

A  is  an  approximation  of  A 


Murphy Golub ,  Wathen:  SISC 1999 


TraceMIN  algorithm 

Choose  initial  (n  x  s)  block  V;  s  ~  2p 
Repeat  until  convergence 

—1.  B-orthonormalize  V 
—2.  Form  approximate  eigenpairs 
YT A  Y  =  I,  YT  B  Y  =  I;  I  =  diag  (op  o2, os) 
—3.  R  =  AY-BYI,  check  norm(rk )  /  ok<  10  5 
—4.  Solve  saddle-point  problem  for  A  or  Y- A 
—5.  update  V  =  Y  -  A 


Convergence 

•  Convergence  bounded  by  \  Ak/As+1  \ 

•  Origin  shifts  can  lead  to  faster  convergence 
-(A  -  nk  B)  xk  =  (Ak  -  nk)  xk 
-\(Ak-Hk)/Astl\ 

•  If  A  is  s.p.d,  then 

®k  =  (yk-xk)TA(yk-xk) 
is  reduced  asymptotically  by  a  factor  of 

(K/Ki)2 
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Simultaneous  Iterations  &  TraceMIN 


•  Simultaneous  Iterations  is  mathematically 
equivalent  to  TraceMIN. 

•  In  each  step  of  Simultaneous  Iterations,  we 
need  to  solve  systems  of  the  form  A  V  =  BY. 

•  //  we  solve  such  systems,  as  well  as  TraceMIN 
saddle-point  problems  using  Minres,  for 
example,  with  a  modest  stopping  criterion,  i.e. 
rel.  res.  <  0.5,  we  get  vastly  different  behavior 
in  reduction  of  the  trace. 


n 


10* 


12 


Three  trace  minimization  schemes  for 
the  symmetric  eigenvalue  problem 


Solver 

TraceMIN 

1982 

Jacobi-Davidson 
(JD  - 1996) 

TraceMIN- 

Davidson 

(TD-2000) 

Subspace 

dimension 

constant 

expanding 

expanding 

Choice  of  conservative  aggressive  conservative 
shifts 


How  to  handle  expanding  subspaces 

V  :=  [V,  A] 

Upon  restart,  retain  only  relevant  part  of  the  subspace 


Performance  Results:  a  sample 


Codes  in  Tri linos  framework 


TraceMIN  &  TraceMIN-Davidson  (TD)  vs: 

1.  The  DOE  Tri  linos  Anasazi  package  - 

—  Krylov  Schur  (KS)  [G.W.  Stewart];  (BKS)  [Saad] 

—  Locally  Optimal  Block  Preconditioned  Conjugate 
Gradient  ( LOBPCG ),  [Knyazev] 

—  Riemannian  Trust  Region  (RTR)  **  [Baker  et  al] 

2.  The  SLEPc  package  -  Jacobi-Davidson  (JD)  ** 

**  TraceMIN-based 


Robustness 

•  Obtain  the  4  smallest  eigenpairs  of  the 
standard  eigenvalue  problem:  Ax-Ax 

•  Source:  Tim  Davis  sparse  matrix  collection 

•  3  benchmarks  for  which  A  is  symmetric  positive 
definite 

•  1  benchmark  for  which  A  is  indefinite 

•  ✓  :  success 

•  X:  failure  (no  convergence  or  wrong  multiplicity) 
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Computing  Platform 

•  Intel  cluster:  Endeavor 

•  Infiniband  interconnection  network 

•  Each  node  consists  of  two  14-core  Intel 
processors  running  at  2.6  GHz 

•  64  GB  per  node 

•  Hybrid  programming  paradigm  MPI/OpenMP 
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Robustness 


Matrix 

Type 

n 

TD 

BKS 

LOBPCG 

RTR 

KS 

JD 

Poisson 

spd 

64  M 

✓ 

✓ 

X 

✓ 

X 

X 

Flan_1565 

spd 

1.5  M 

✓ 

X 

X 

✓ 

X 

✓ 

Hook_1498 

spd 

1.5  M 

✓ 

X 

X 

✓ 

X 

✓ 

nlpkkt240 

Indef. 

28  M 

✓ 

X 

X 

X 

X 

✓ 

SLEPc 


ANASAZI 


T ime( algorithm  X)  /  Time(TD) 

•  32  nodes  (24  cores  per  node) 


•  128  nodes  (24  cores  per  node) 


Flan_1565  spd  1.5  M  i  Q  *  *  1.3  X  2.1 


nlpkkt240 

indef 

28  M  I  Q 

X 

X 

X  X  6.6 

18 


8  16  32  64  128 


computing  the  4  smallest 
eigen  pairs  of  nlpkkt240 


nodes 
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TraceMIN:  multisectioning  &  sampling 

(codes  in  Fortran  90) 

Multisectioning: 

—For  obtaining  a  large  number  of  interior 
eigenvalues  and  their  corresponding 
eigenvectors 

Sampling: 

—Obtaining  several  eigen  pairs  in  the 
neighborhood  of  a  given  number  of 

points  within  the  spectrum 
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Multisectioning 

•  Using  several  shifts  a  we  can  obtain  that 
interval  (0,t)  that  contains  the  number  of 
eigenpairs  desired. 

•  Divide  the  interval  under  consideration  into 
many  subintervals,  one  subinterval  per  node; 
each  containing  roughly  the  same  number  of 
eigenvalues  (e.g.  20),  if  possible 

•  For  each  node  j,  we  use  our  shared-memory 
TraceMIN  eigensolver  to  obtain  the  smallest 
eigenpairs  of  (A-  a,  B)  x  =  ( A-  a  )  B  x. 
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With  the  shifts  a ,  we  get  saddle-point 
problems  of  the  form: 

-A*WO\ 

A  sparse  direct  solver  (Pardiso)  is  used  to 
Obtain  the  factorization  of  the  indefinite 
(1,1)  block: 


(A  —  ciB 


Y'B 


BYk\ 


O 


y 


Yk 


A  —  clB  =  LDL1 . 


Multisectiofling  -  an  illustration 

•  Obtain  the  eigenpairs  corresponding  to  [a,  (3] 

•  [a,  |3]  is  divided  into  4  sub-intervals 

•  Use  five  nodes 

•  Except  for  Node  # 1 ,  each  node  performs  2  factorizations: 

—  one  for  computing  the  inertia  (blue),  and 

-  one  for  the  TraceMIN  iteration  (red). 

•  Node  1  performs  one  factorization  with  shift  a 

a  p 

. 

t  |  till 

[1]  [2]  [2]  [3]  [3]  [4]  [4]  [5]  [5] 
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A  NAS  TRAN  benchmark 


Car  body  dynamics  - 

—A  x  =  A  B  x  (order:  1.5  M,  and  7.2  M) 

—A=A\  B  :=  s.p.d. 

—  norm(R,inf)  <  10~6 

-Compute  the  smallest  1000 
eigenpairs  using  129  nodes. 
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Sparsity 
Structure 
of  A 

n~7.2  M 
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TraceMINw /  multisectioning 
vs.  Intel ’s  Math  Kernel  Library ’s  FEA  S  T 


size  #  of  FEAST 

smallest  (seconds) 

As 


Nastranl  1.5  M  1000  59  121  2.2 


Nastran  2  7.2  M  1000  418  Failure 


Speed 

Improve. 


Problem 


All  Trilinos  eigensolvers  failed  on  this  NASTRAN  benchmark 


TraceMINw /  multisectioning 

vs.  Intel ’s  Math  Kernel  Library ’s  FEA  ST... 
using  129  nodes;  norm  (rk ,  inf}/ crk<  10  5 


Problem 

size 

#  of  As 
interval 

TraceMIN 

(seconds) 

FEAST 

(seconds) 

Speed 

Improve. 

Anderson  * 

1.0  M 

1143 
[-.01,.  01] 

792 

7910 

10.0 

af_shelllO  1.5  M  1045 

[2000,2250] 

37  274  7.4 

•Anderson  Model  of  Localization  (Schrodinger  eq.):  H  x  =  Ax 

diag  (H):  random;  offdiag(H)  =  1 
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Sampling:  computing  400  eigenpairs 


Matrix 

Size 

Interval 

abs.  res. 

10  5 

abs.  res. 

10 -9 

NASTRAN 1 

1.5  M 

[-.01, 103] 

21s 

40  s 

05  s/ep 

~  .10  s/ep 

NASTRAN  2 

7.2  M 

[-.01,  3  *  106] 

181  s 

302  s 

~.45  s/ep 

76  s/ep 

Obtain  the  4  eigenvalues  closest  to  100  points  uniformly 
distributed  in  each  interval  and  the  corresponding 
eigenvectors  using  TraceMIN  on  100  nodes 
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